Stochastic Deconvolutional Neural Network Ensemble Training on Generative Pseudo-Adversarial Networks
Authors
Abstract
The training of Generative Adversarial Networks (GANs) is a difficult task, mainly due to the nature of the networks. One issue arises when the generator and discriminator start oscillating rather than converging to a fixed point. Another occurs when one agent becomes more adept than the other, which reduces the other agent's ability to learn and thereby the learning capacity of the system as a whole. Additionally, there exists the problem of 'mode collapse', in which the generator's output collapses to a single sample or a small set of similar samples. Training GANs therefore requires careful selection of the architecture used, along with a variety of other methods to improve training. Even when these methods are applied, training remains unstable with respect to the parameters chosen. Stochastic ensembling is suggested as a method for improving stability while training GANs.

Introduction

Deep networks have made great advances in the area of generative models. These advances have been the result of a wide range of training losses and architectures, including but not limited to Generative Adversarial Networks (GANs) [1]. Most deep generative models are trained using two models that solve a minimax 'game': a generator samples data, while a discriminator classifies the data as real or generated. In theory these models are capable of modeling an arbitrarily complex probability distribution. The ability to train flexible generating functions has made GANs extremely successful in image generation [2].

In practice, however, GANs suffer from many issues, particularly during training. One common failure mode involves the generator collapsing to produce only a single sample or a small family of very similar samples. Another involves the generator and discriminator oscillating during training, rather than converging to a fixed point. In addition, if one agent becomes much more powerful than the other, the learning signal to the other agent becomes useless, and the system does not learn. Many attempts have been made to minimise the mode-collapse problem and improve the variety of the output [3, 4, 5, 6]. However, some solutions are computationally expensive and treat the mode-collapse problem only symptomatically.

The assumption behind the methodology described later is that the architecture of a typical GAN causes mode collapse to occur. The discriminator constantly requires new samples from the generator and, due to how it is defined, it never reaches a state in which the output of the generator is satisfactory. This leaves two possible ways for the model to evolve. First, in the case where the generator is overly powerful, the network can start oscillating: even the slightest modification of parameters can result in significantly different outputs that the discriminator cannot "remember". In this situation the output differs from epoch to epoch, at the cost of local variety within one epoch. We call this scenario the "soft-collapse" of a model. If the generator is weaker, the oscillation scenario cannot occur; instead, a situation called "hard-collapse" may manifest. Here, after a small number of attempts to significantly modify the output and enter the oscillation mode, the generator fails: the discriminator becomes absolutely certain that all samples are fake, the loss of the generator becomes effectively infinite, its gradients become undefined, and training cannot progress further.
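As a point of reference, the minimax game and the hard-collapse limit described above can be written using the standard GAN formulation of [1] (standard notation, not formulas introduced in this paper):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

In the hard-collapse scenario the discriminator assigns D(G(z)) \approx 0 to every generated sample, so the commonly used non-saturating generator loss -\mathbb{E}_{z \sim p_z}\left[\log D(G(z))\right] diverges towards +\infty and its gradient is undefined at D(G(z)) = 0, which matches the behaviour described above.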
As we believe that mode collapse is an unavoidable situation, another, synthetic way of addressing this issue is suggested: the simple idea of Stochastic Ensembling, which can be described as random shuffling of filters in the deep layers of the generator. This is comparable to creating a set of weak generators that can still suffer from the mode-collapse problem individually, but together produce an acceptable variety of output. The efficiency of the described method is demonstrated on another model, a Pseudo-GAN, where the role of the discriminator is played by any pre-trained image classifier. This can be seen as a state of absolute mode collapse from the very beginning of training.

Methodology

The main difference between a standard GAN architecture and one using Stochastic Ensembling lies in the way the deep layers of the generator are constructed. In these layers, stochastic deconvolution is applied, the main idea of which is to randomly select a set of filters from a fixed filter bank. In this architecture a stochastic deconvolution layer is constructed using filters of size 4, applied with a stride of 2. A parameterised Leaky ReLU activation (initialised to 0.2) was applied to improve model fitting [7], and weight normalization was used to improve stability [8]. The higher-level layers are left as standard deconvolution layers so as to provide refinement for the network, and they can be reused between different combinations of deep layers. Meanwhile, different combinations of deep layers are available to cover the different distributions in the training dataset.

An architecture that could achieve the same effect as Stochastic Ensembling is to split the generator into an ensemble of generators with shared upper layers. This, however, increases the network's size requirements, making it computationally expensive to train. Stochastic deconvolutions, on the other hand, create different 'routes' using only different filters in the deep layers. From an intuitive point of view, the combination of paths covers different visual "topics" in the training distribution, for which high-level features are usually shared. This prevents the network from collapsing early and describes the distribution more effectively. It does not guarantee that GANs based on stochastic deconvolution do not suffer from mode collapse, but it does provide some redundancy: even if each of the 4096 routes in the example above collapsed, the ensemble would still provide some variety. Another benefit of using stochastic deconvolutional layers is that the size of the filters can be kept smaller. This enables the discriminator to outperform each sub-generator, so that in the worst case a sub-generator starts oscillating without experiencing hard mode collapse.

We believe that stochastic ensembling can be beneficial outside of GAN models and can be useful for any problem that involves generative models. The approach could offer an avenue of further research for application to non-generative models as an easily implemented alternative to other ensemble techniques.

[Figure: generator architecture, showing a random uniform (-1, 1) 8x8 input, stochastic deconvolution layers (sdeconv, 576, 16x16) and a standard deconvolution (deconv, 3) to the output size.]
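The paper provides no code; the following is a minimal PyTorch-style sketch, under stated assumptions, of what a stochastic deconvolution layer of this kind could look like: a fixed filter bank from which a random subset of 4x4, stride-2 transposed-convolution filters is drawn on every forward pass, followed by a PReLU initialised to 0.2. The class name, the bank size, the number of selected filters and the simplified per-filter normalization (standing in for weight normalization [8]) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticDeconv2d(nn.Module):
    # Sketch of a stochastic deconvolution layer: each forward pass draws a
    # random subset of transposed-convolution filters from a fixed bank, so
    # every pass takes a different "route" through the deep generator layers.
    def __init__(self, in_channels, bank_size=576, out_channels=64,
                 kernel_size=4, stride=2, padding=1):
        super().__init__()
        # Fixed filter bank, shape (in_channels, bank_size, k, k); bank_size
        # and out_channels here are illustrative assumptions.
        self.bank = nn.Parameter(
            0.02 * torch.randn(in_channels, bank_size, kernel_size, kernel_size))
        self.out_channels = out_channels
        self.stride, self.padding = stride, padding
        # Parameterised Leaky ReLU initialised to a slope of 0.2, as in the paper.
        self.act = nn.PReLU(init=0.2)

    def forward(self, x):
        # Randomly select which filters from the bank are active on this pass.
        idx = torch.randperm(self.bank.size(1), device=x.device)[:self.out_channels]
        weight = self.bank[:, idx]  # (in_channels, out_channels, k, k)
        # Crude per-filter normalization, standing in for weight normalization [8].
        weight = weight / (weight.flatten(2).norm(dim=2)[..., None, None] + 1e-8)
        y = F.conv_transpose2d(x, weight, stride=self.stride, padding=self.padding)
        return self.act(y)

# Usage sketch: an 8x8 noise map is upsampled to 16x16 by the stochastic layer,
# before shared higher-level (standard) deconvolution layers refine the output.
z = torch.rand(16, 128, 8, 8) * 2 - 1      # uniform noise in (-1, 1)
layer = StochasticDeconv2d(in_channels=128)
features = layer(z)                        # shape: (16, 64, 16, 16)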
Similar resources
A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA
In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been proposed for convolutional neural networks (CNNs) that enable high performance for classification tasks at lower power than CPU and GPU processors. However, ...
Stochastic reconstruction of an oolitic limestone by generative adversarial networks
Stochastic image reconstruction is a key part of modern digital rock physics and materials analysis that aims to create numerous representative samples of material microstructures for upscaling, numerical computation of effective properties and uncertainty quantification. We present a method of three-dimensional stochastic image reconstruction based on generative adversarial neural networks (GA...
Automatic Colorization of Grayscale Images Using Generative Adversarial Networks
Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...
C-RNN-GAN: Continuous recurrent neural networks with adversarial training
Generative adversarial networks have been proposed as a way of efficiently training deep generative neural networks. We propose a generative adversarial model that works on continuous sequential data, and apply it by training it on a collection of classical music. We conclude that it generates music that sounds better and better as the model is trained, report statistics on generated music, and...
mixup: BEYOND EMPIRICAL RISK MINIMIZATION
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple lin...
Journal: CoRR
Volume: abs/1802.02436, Issue: -
Pages: -
Publication date: 2018